Advances in Intelligent Data Analysis VI

chapter

Back Matter

Lecture Notes in Computer Science > Advances in Intelligent Data Analysis VI

chapter

Probabilistic Latent Clustering of Device Usage

Jean-Marc Andreoli, Guillaume Bouchard

Lecture Notes in Computer Science > Advances in Intelligent Data Analysis VI > 1-11

We investigate an application of Probabilistic Latent Semantics to the problem of device usage analysis in an infrastructure in which multiple users have access to a shared pool of devices delivering different kinds of service and service levels. Each invocation of a service by a user, called a job, is assumed to be logged simply as a co-occurrence of the identifier of the user and that of the device...

chapter

Condensed Nearest Neighbor Data Domain Description

Fabrizio Angiulli

Lecture Notes in Computer Science > Advances in Intelligent Data Analysis VI > 12-23

A popular method to discriminate between normal and abnormal data is based on accepting test objects whose nearest-neighbor distances in a reference data set lie within a certain threshold. In this work we investigate the possibility of using as reference set a subset of the original data set. We discuss the relationship between reference set size and generalization, and show that finding the minimum...

chapter

Balancing Strategies and Class Overlapping

Gustavo E. A. P. A. Batista, Ronaldo C. Prati, Maria C. Monard

Lecture Notes in Computer Science > Advances in Intelligent Data Analysis VI > 24-35

Several studies have pointed out that class imbalance is a bottleneck in the performance achieved by standard supervised learning systems. However, a complete understanding of how this problem affects the performance of learning is still lacking. In previous work we identified that performance degradation is not solely caused by class imbalances, but is also related to the degree of class overlapping...

chapter

Modeling Conditional Distributions of Continuous Variables in Bayesian Networks

Barry R. Cobb, Rafael Rumí, Antonio Salmerón

Lecture Notes in Computer Science > Advances in Intelligent Data Analysis VI > 36-45

The MTE (mixture of truncated exponentials) model was introduced as a general solution to the problem of specifying conditional distributions for continuous variables in Bayesian networks, especially as an alternative to discretization. In this paper we compare the behavior of two different approaches for constructing conditional MTE models in an example taken from finance, which is a domain where...

chapter

Kernel K-Means for Categorical Data

Julia Couto

Lecture Notes in Computer Science > Advances in Intelligent Data Analysis VI > 46-56

Clustering categorical data is an important and challenging data analysis task. In this paper, we explore the use of kernel K-means to cluster categorical data. We propose a new kernel function based on Hamming distance to embed categorical data in a constructed feature space where the clustering is conducted. We experimentally evaluated the quality of the solutions produced by kernel K-means on real...

chapter

Using Genetic Algorithms to Improve Accuracy of Economical Indexes Prediction

Óscar Cubo, Víctor Robles, Javier Segovia, Ernestina Menasalvas

Lecture Notes in Computer Science > Advances in Intelligent Data Analysis VI > 57-65

All sorts of organizations need as much information as possible about their target population. Public datasets provide one important source of this information. However, the use of these databases is very difficult due to the lack of cross-references. In Spain, two main public databases are available: Population and Housing Censuses and Family Expenditure Surveys. Both of them are published by Spanish...

chapter

A Distance-Based Method for Preference Information Retrieval in Paired Comparisons

Esther Dopazo, Jacinto González-Pachón, Juan Robles

Lecture Notes in Computer Science > Advances in Intelligent Data Analysis VI > 66-73

The pairwise comparison method is an interesting technique for assessing priority weights for a finite set of objects. In fact, some web search engines use this inference tool to quantify the importance of a set of web sites. In this paper we deal with the problem of incomplete paired comparisons. Specifically, we focus on the problem of retrieving preference information (as priority weights) from...

chapter

Knowledge Discovery in the Identification of Differentially Expressed Genes in Tumoricidal Macrophage

A. Fazel Famili, Ziying Liu, Pedro Carmona-Saez, Alaka Mullick

Lecture Notes in Computer Science > Advances in Intelligent Data Analysis VI > 74-85

High-throughput microarray data are extensively produced to study the effects of different treatments on cells and their behaviours. Understanding this data and identifying patterns of groups of genes that behave differently or similarly under a set of experimental conditions is a major challenge. This has motivated researchers to consider multiple methods to identify patterns in the data and study...

chapter

Searching for Meaningful Feature Interactions with Backward-Chaining Rule Induction

Doug Fisher, Mary Edgerton, Lianhong Tang, Lewis Frey, et al.

Lecture Notes in Computer Science > Advances in Intelligent Data Analysis VI > 86-96

Exploring the vast number of possible feature interactions in domains such as gene expression microarray data is an onerous task. We propose Backward-Chaining Rule Induction (BCRI) as a semi-supervised mechanism for biasing the search for plausible feature interactions. BCRI adds to a relatively limited tool-chest of hypothesis generation software, and it can be viewed as an alternative to purely...

chapter

Exploring Hierarchical Rule Systems in Parallel Coordinates

Thomas R. Gabriel, A. Simona Pintilie, Michael R. Berthold

Lecture Notes in Computer Science > Advances in Intelligent Data Analysis VI > 97-108

Rule systems have failed to attract much interest in large data analysis problems because they tend to be too simplistic to be useful or consist of too many rules for human interpretation. We recently presented a method that constructs a hierarchical rule system, with only a small number of rules at each level of the hierarchy. Lower levels in this hierarchy focus on outliers or areas of the feature...

chapter

Bayesian Networks Learning for Gene Expression Datasets

Giacomo Gamberoni, Evelina Lamma, Fabrizio Riguzzi, Sergio Storari, et al.

Lecture Notes in Computer Science > Advances in Intelligent Data Analysis VI > 109-120

DNA arrays yield a global view of gene expression and can be used to build genetic network models in order to study relations between genes. The literature proposes Bayesian networks as an appropriate tool for developing such models. In this paper, we exploit the contribution of two Bayesian network learning algorithms to generate genetic networks from microarray datasets of experiments performed on Acute...

chapter

Pulse: Mining Customer Opinions from Free Text

Michael Gamon, Anthony Aue, Simon Corston-Oliver, Eric Ringger

Lecture Notes in Computer Science > Advances in Intelligent Data Analysis VI > 121-132

We present a prototype system, code-named Pulse, for mining topics and sentiment orientation jointly from free text customer feedback. We describe the application of the prototype system to a database of car reviews. Pulse enables the exploration of large quantities of customer free text. The user can examine customer opinion “at a glance” or explore the data at a finer level of detail. We describe...

chapter

Keystroke Analysis of Different Languages: A Case Study

Daniele Gunetti, Claudia Picardi, Giancarlo Ruffo

Lecture Notes in Computer Science > Advances in Intelligent Data Analysis VI > 133-144

Typing rhythms are one of the rawest forms of data stemming from the interaction between humans and computers. When properly analyzed, they may allow personal identity to be ascertained. In this paper we provide experimental evidence that the typing dynamics of free text can be used for user identification and authentication even when typing samples are written in different languages. As a consequence,...

chapter

Combining Bayesian Networks with Higher-Order Data Representations

Elias Gyftodimos, Peter A. Flach

Lecture Notes in Computer Science > Advances in Intelligent Data Analysis VI > 145-156

This paper introduces Higher-Order Bayesian Networks, a probabilistic reasoning formalism which combines the efficient reasoning mechanisms of Bayesian Networks with the expressive power of higher-order logics. We discuss how the proposed graphical model is used in order to define a probability distribution semantics over particular families of higher-order terms. We give an example of the application...

chapter

Removing Statistical Biases in Unsupervised Sequence Learning

Yoav Horman, Gal A. Kaminka

Lecture Notes in Computer Science > Advances in Intelligent Data Analysis VI > 157-167

Unsupervised sequence learning is important to many applications. A learner is presented with unlabeled sequential data, and must discover sequential patterns that characterize the data. Popular approaches to such learning include statistical analysis and frequency based methods. We empirically compare these approaches and find that both approaches suffer from biases toward shorter sequences, and...

chapter

Learning from Ambiguously Labeled Examples

Eyke Hüllermeier, Jürgen Beringer

Lecture Notes in Computer Science > Advances in Intelligent Data Analysis VI > 168-179

Inducing a classification function from a set of examples in the form of labeled instances is a standard problem in supervised machine learning. In this paper, we are concerned with ambiguous label classification (ALC), an extension of this setting in which several candidate labels may be assigned to a single example. By extending three concrete classification methods to the ALC setting and evaluating...

chapter

Learning Label Preferences: Ranking Error Versus Position Error

Eyke Hüllermeier, Johannes Fürnkranz

Lecture Notes in Computer Science > Advances in Intelligent Data Analysis VI > 180-191

We consider the problem of learning a ranking function, that is, a mapping from instances to rankings over a finite number of labels. Our learning method, referred to as ranking by pairwise comparison (RPC), first induces pairwise order relations from suitable training data, using a natural extension of so-called pairwise classification. A ranking is then derived from a set of such relations by means...

chapter

FCLib: A Library for Building Data Analysis and Data Discovery Tools

Wendy S. Koegler, W. Philip Kegelmeyer

Lecture Notes in Computer Science > Advances in Intelligent Data Analysis VI > 192-203

In this paper we describe a data analysis toolkit constructed to meet the needs of data discovery in large scale spatio-temporal data. The toolkit is a C library of building blocks that can be assembled into data analyses. Our goals were to build a toolkit which is easy to use, is applicable to a wide variety of science domains, supports feature-based analysis, and minimizes low-level processing....

chapter

A Knowledge-Based Model for Analyzing GSM Network Performance

Pasi Lehtimäki, Kimmo Raivio

Lecture Notes in Computer Science > Advances in Intelligent Data Analysis VI > 204-215

In this paper, a method to analyze GSM network performance on the basis of massive data records and application domain knowledge is presented. The available measurements are divided into variable sets describing the performance of the different subsystems of the GSM network. Simple mathematical models for the subsystems are proposed. The model parameters are estimated from the available data record...

INFONA - science communication portal

Advances in Intelligent Data Analysis VI
6th International Symposium on Intelligent Data Analysis, IDA 2005, Madrid, Spain, September 8-10, 2005. Proceedings

